ABSTRACT

The scripts of 288 television episodes were analysed to determine the extent to which vocabulary reoccurs in 
television programs from the same subgenres and unrelated television programs from different genres. Episodes 
from two programs from each of the following three subgenres of the American drama genre: medical, 
spy/action, and criminal forensic investigation were compared with different sets of random episodes. The 
results showed that although there were an equivalent number of running words in each set of episodes, the 
episodes from programs within the same subgenre contained fewer word families than random programs. The 
findings also showed that low frequency word families (4000-14,000 levels) reoccur more often in programs 
within the same subgenre. Together the results indicate that watching programs within the same subgenre may be 
an effective approach to language learning with television because it reduces the lexical demands of viewing and 
increases the potential for vocabulary learning. 

KEYWORDS: 

Comprehension, corpus linguistics, genre, incidental vocabulary learning, television, vocabulary coverage, word 
frequency. 

RESUMEN 

Los guiones de 288 episodios televisivos se analizaron para determinar el alcance de la recursividad del 
vocabulario en programas de television del mismo subgenero y en programas no relacionados de generos 
diferentes. Se compararon episodios de tres subgeneros del drama americano: medico, de espias/accion y de 
investigacion forense, con varios grupos de episodios elegidos al azar. Los resultados muestran que, aunque el 
numero de palabras en cada grupo de episodios era equivalente, los episodios del mismo subgenero contienen 
menos familias de palabras que aquellos elegidos al azar. Los hallazgos mostraron que las familias de baja 
frecuencia (niveles de 4.000-14.000) se repiten con mas frecuencia en los programas del mismo subgenero. En 
conjunto, los resultados indican que el visionado de programas del mismo subgenero puede ser un metodo 
efectivo para aprender el lenguaje por medio de la television porque reduce la demanda lexica de la proyeccion y 
aumenta el potencial de aprendizaje de vocabulario. 
1. INTRODUCTION

Television is a valuable resource for language learning. English language television programs 
are widely available in English as a second language (ESL) and English as a foreign language 
(EFL) contexts, and research indicates that foreign language learners are motivated to learn
through watching television (Bada & Okan, 2000; Gieve & Clark, 2005). Research has also 
shown that L2 learners may incidentally learn vocabulary through watching television and 
short videos (d’Ydewalle & Pavakanun, 1995; d’Ydewalle & Van de Poel, 1999; Koolstra & 
Beentjes, 1999; Neuman & Koskinen, 1992; Pavakanun & d’Ydewalle, 1992), and that L2 
viewers may learn as many words incidentally through watching television as they would 
through reading a script of the program (Neuman & Koskinen, 1992). Television provides 
authentic L2 aural input which contributes to learning the spoken form of words and is thus a 
useful complement to learning through reading. 

In a corpus-driven study looking at the number of words needed to understand the 
vocabulary in television programs, Webb and Rodgers (2009a) found that a vocabulary size of 
3000 word families plus proper nouns and marginal words provided 95.45% coverage of a 
corpus made up of 88 television programs from a variety of genres. They reported that 
knowing the most frequent 3000 word families may be sufficient for adequate comprehension 
of television programs and that a learning approach which involved regular viewing could 
lead to large incidental vocabulary learning gains. They suggested that for learners with the 
appropriate vocabulary size, increased viewing should lead to increased vocabulary learning 
because research on incidental vocabulary learning has shown that the more unknown words 
are encountered in context, the more likely they are to be learned (Horst, Cobb, & Meara,
1998; Jenkins, Stein, & Wysocki, 1984; Rott, 1999; Saragi, Nation, & Meister, 1978; Waring 
& Takaki, 2003; Webb, 2007). 

Webb and Rodgers’ (2009a) findings also shed light on differences between television
genres. Children’s programs were found to have the smallest vocabulary load; the most 
frequent 2000 word families, plus proper nouns and marginal words accounted for 95% 
coverage. The most frequent 3000 word families plus proper nouns and marginal words 
accounted for 95% of American drama, older programs, situation comedies and British 
programs. The genres with the greatest proportions of low frequency words were news stories 
and science fiction programs. Results also indicated that coverage is likely to vary between
episodes of programs, leading Webb and Rodgers to suggest that randomly viewing programs
may limit comprehension. Instead, they proposed watching programs from within the same
subgenre that have similar topics and storylines.

The aim of the present study is to examine the number of word types and word families 
in television programs from the same subgenre and unrelated television programs from 
different genres, as well as the number of encounters with low frequency words. Determining 
the number of word types and families provides some indication of the lexical demands of 
text. This is useful because viewers are unlikely to watch L2 television programs that they
cannot understand. Examining the number of encounters with low frequency words indicates 
the potential for vocabulary learning through reading or hearing a text. Comparing the number 
of encounters with low frequency words in different types of programs is also important 
because it may indicate how to effectively use television for language learning and provide 
direction towards optimizing vocabulary learning. 


2. BACKGROUND 

In corpus-driven research, few assumptions are made in advance of the analysis. Instead the 
patterns revealed in the corpus analysis provide the basis for subsequent descriptions of the 
corpus (Biber, 2009). Corpus-driven studies focused on vocabulary are well established and 
have looked at the vocabulary load and potential for incidental vocabulary learning in related 
and unrelated text (Hwang & Nation, 1989; Schmitt & Carter, 2000; Sutarsyah, Nation, & 
Kennedy, 1994), and the number of words necessary for comprehension of spoken discourse 
(Adolphs & Schmitt, 2003; Meara, 1991, 1993; Nation, 2006), written discourse (Meara, 
1993; Nation, 2006), television programs (Webb & Rodgers, 2009a), and movies (Webb & 
Rodgers, 2009b). Corpus-driven research has also focused on the potential for incidental 
vocabulary learning through encountering language in speech and writing (Cobb, 2007; Horst, 
2009; Meara, Lightbown, & Halter, 1997; Webb, 2010a; Webb & Rodgers, 2009a; Wodinsky
& Nation, 1988). 

Corpus-driven studies on narrow reading (reading texts with related content) have 
shown that when there are a similar number of running words in related and unrelated texts,
related texts are likely to have a smaller lexical load than unrelated texts because there are 
fewer different words in related texts (Hwang & Nation, 1989; Schmitt & Carter, 2000; 
Sutarsyah, Nation, & Kennedy, 1994). Sutarsyah, Nation, and Kennedy (1994) examined the 
vocabulary in a single Economics text in comparison to a collection of academic texts with a 
similar number of running words. Their analysis showed that the vocabulary load of a single
text is likely to be smaller than that of unrelated texts. The Economics text was made up of a much
smaller number of types (9,469) and word families (5,438) than the academic texts (21,399
and 12,744, respectively).

Hwang and Nation (1989) examined the vocabulary in running stories (newspaper 
stories and their subsequent follow-up stories) and unrelated stories from newspapers. An 
analysis of 20 sets of four running stories and 20 sets of four unrelated stories showed that 
lower frequency word families (words outside of the 2000 most frequent word families) were 
encountered more often in the related texts than in the unrelated texts. This indicates that there 
is greater potential for vocabulary learning through reading related texts because research has 
consistently shown that as the number of encounters with unknown words in context 
increases, the words are more likely to be learned (Horst, Cobb, & Meara, 1998; Jenkins, 
Stein, & Wysocki, 1984; Rott, 1999; Saragi, Nation, & Meister, 1978; Waring & Takaki,
2003; Webb, 2007). Hwang and Nation suggested that narrow reading provided better 
conditions for learning low frequency words and reduced the vocabulary load by decreasing 
the number of times that L2 readers would need to look up words in dictionaries. 

A similar study by Schmitt and Carter (2000) looked at the vocabulary in a series of 
nine newspaper stories focused on the death of Princess Diana and the vocabulary in nine 
unrelated newspaper stories. Both sets of stories consisted of the same number of running 
words (7,843). The analysis showed that there were 156 fewer types and a greater number of 
encounters with those words in the related stories than in the unrelated stories. Schmitt and 
Carter concluded that reading related stories lowers the lexical load for L2 learners and may 
allow for earlier learning with authentic reading materials. 

The studies investigating narrow reading indicate that related texts are likely to have 
fewer word types and word families than unrelated texts, and a greater number of encounters
with the low frequency words in the texts. The studies of narrow reading have been useful 
because they provide direction on materials selection. Reading texts with similar topics may 
be effective because it lowers the lexical load and provides better conditions for vocabulary 
learning. 

In the only study that has compared related and unrelated television programs, Rodgers 
and Webb (in press) compared the vocabulary in episodes of the same television programs 
and in unrelated television programs. Approximately 24 episodes from each of six television 
programs from a single genre (American drama) were compared with sets of unrelated 
episodes from programs from a variety of genres. Both sets of episodes (related and unrelated) 
had the same number of running words. The results showed that there were fewer word 
families in each of the sets of episodes from a single program than in the sets of unrelated 
programs. Rodgers and Webb also found that low frequency word families (words from the 
4,000-14,000 BNC word lists) were encountered more often in each of the sets of episodes 
from the same program than the sets of unrelated episodes. The findings indicate that it may 
be more effective to watch different episodes of the same television program rather than 
episodes of different programs because the vocabulary load is likely to be lower when 
watching episodes of the same program. Although there are many factors that contribute to 
comprehension of a text, vocabulary has the largest effect (Laufer, 1989); if you do not know 
the words that occur in a text it is difficult to understand it. 

The present study expands on Rodgers and Webb’s (in press) research by investigating 
different television programs from the same subgenre rather than different episodes of the 
same program. The scripts of 142 episodes from six television programs from three subgenres
of television drama (medical dramas, spy/action dramas, and criminal forensic investigation
dramas) were compared to three sets of random television programs (146 episodes). Each of
the three subgenres was made up of a full season from two television programs 
(approximately 48 episodes in each). Each of the subgenres and the sets of random programs 
had the same number of running words. The aim of the study is to determine whether related
programs have a lower lexical load and a greater number of encounters with low frequency 
words than unrelated programs. The analysis may indicate an effective approach to using 
television for language learning by providing direction on how to reduce the lexical burden 
and optimize vocabulary learning. 


3. THE PRESENT STUDY 

3.1. Research questions 

The present study seeks to address the following research questions: 

1. Are there fewer word types and families in television programs from the same subgenre 
than in unrelated programs? 

2. Does vocabulary reoccur more often in different episodes of programs from the same 
television subgenre than in random television programs? 

3. How many word families are encountered 10 or more times in a full season of two 
television programs (approximately 48 episodes) within the same subgenre or an 
equivalent amount of random television episodes? 

3.2. Materials 

The scripts of 288 television episodes were analyzed in this study. The episodes had a total 
running time of 203 hours and 49 minutes and a mean running time of 42 minutes. The 
running time for all but seven of the programs was one hour including commercials; however, 
commercials were not included in the analysis. To compare the vocabulary within programs 
from the same subgenre and unrelated programs, 142 episodes from six different television 
programs: 24, Alias, Crossing Jordan, CSI, Grey’s Anatomy, and House, and 146 episodes 
from random programs were analysed in the study. The episodes for each of the programs 
made up one complete season. A season of a television series consists of a number of 
episodes broadcast in the same programming year, where an episode is a single instance of a 
series. The aim of the selection process for the six programs was to choose three pairs of 
programs that contained similarity in content within each pair, but differences between each 
pair. Comparing three pairs of related programs to sets of unrelated television programs may 
provide a valid representation of related and unrelated content in the American television 
drama genre. 

The set of 142 episodes was selected according to the following criteria: genre,
subgenre, availability, running time, date when first aired, and place of origin. All six 
programs were classified as American dramas with the three pairs of programs belonging to 
specific subgenres of American drama. The programs and their subgenres are shown in Table 1.

Subgenre                                  Program

Medical dramas                            House
                                          Grey's Anatomy
Criminal forensic investigation dramas    CSI
                                          Crossing Jordan
Spy/action dramas                         24
                                          Alias

Table 1. Subgenres and programs.


House and Grey’s Anatomy were medical dramas, CSI and Crossing Jordan were
criminal forensic investigation dramas, and Alias and 24 were spy/action dramas. These 
episodes had an average running time of 43 minutes and were first aired between 2001 and 
2006. The scripts were downloaded from the Internet. It is important to note that Internet- 
available television scripts do not always replicate dialogue with 100% accuracy. However, 
the scripts should provide a reliable assessment of the vocabulary in television programs. 

The 146 randomly selected television episodes were used for comparison with the 
episodes in the six programs. These episodes were all classified as American, originally aired 
between 1963 and 2009, and had running times of approximately 22 minutes (broadcast as a 
30 minute program including commercials) or 44 minutes (broadcast as a 60 minute 
program). Three sets of different episodes were created for comparison with the three 
subgenres. Programs from a wide range of genres were used in each set of comparison 
programs, and no set contained multiple episodes of the same program. It is important to note 
that this does not make certain that all of the programs had completely unrelated content as 
two shows from the same genre may have been included. However, the degree of overlap in 
content was likely to be considerably less than in the episodes within each subgenre. The 
appendix lists all of the programs from one of the three random sets. 

The sets of random episodes were made up of the same number of running words as 
each subgenre. To ensure that the random sets had the same number of running words as each 
of the three subgenres, one episode in each of the sets did not include all of its running words. 
The random sets were created with an aim of including as many running words as possible 
from the final episode in the set. Comparing corpora with an equal number of running words 
is necessary for a valid comparison (Nation & Webb, 2011). 
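The token-matching step described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used in the study: the function name `build_matched_set` and the toy episodes are hypothetical, and the sketch simply truncates whichever episode crosses the target rather than choosing the final episode so as to retain as many of its running words as possible.

```python
from typing import Iterable


def build_matched_set(episodes: Iterable[list[str]], target_tokens: int) -> list[list[str]]:
    """Collect episodes until target_tokens is reached, truncating the last one."""
    matched: list[list[str]] = []
    remaining = target_tokens
    for tokens in episodes:
        if remaining <= 0:
            break
        # Take the whole episode if it fits, otherwise only its leading part.
        taken = tokens[:remaining]
        matched.append(taken)
        remaining -= len(taken)
    if remaining > 0:
        raise ValueError("not enough running words to reach the target")
    return matched


# Toy episodes (hypothetical data): 5 + 4 + 6 running words, target of 12.
toy_episodes = [
    "we need a crash cart".split(),
    "the suspect left prints".split(),
    "get me the satellite feed now".split(),
]
matched_set = build_matched_set(toy_episodes, target_tokens=12)
matched_total = sum(len(episode) for episode in matched_set)
```

In this toy case the first two episodes are kept whole and only the first three running words of the third are included, so the set holds exactly 12 running words.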

Words which were not spoken in the scripts such as stage directions, setting features, 
and speakers’ names were removed. Only the spoken words from the programs were 
analysed. Contractions, connected speech, and hyphenated words were changed to conform
with the spellings used in Nation’s (2004) British National Corpus (BNC) word lists. 
Contractions accounted for 0.39% of the tokens in the study (0.28% of the tokens in the six 
programs and 0.49% of the tokens from the random episodes). For example, s’pose, ol’, 
wanna, and kinda were changed to suppose, old, want to, and kind of, respectively. If the 
spellings were not changed these items would have been classified as being less frequent than 
the most frequent 14,000 word families. However, it should be noted that knowing the
changed spellings does not mean that the original forms would also be known. For example,
learners may know old and want to but they might not recognize ol’ and wanna. Webb and
Rodgers (2009a, 2009b) suggest that a high percentage of contractions may hinder
vocabulary comprehension and learning in speech. To the best of our knowledge no study has
yet investigated the effect that the percentage of contractions and connected speech may have 
on comprehension and incidental learning. The relatively small percentage of items that were 
changed to conform to the spellings in the BNC lists would suggest that the percentage of 
contractions in the episodes, at least as indicated by the scripts, would be quite small. 
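The respelling step can be illustrated with a short sketch. The mapping below contains only the four examples given above; the full set of changes made in the study is not reported, so the table `RESPELLINGS`, the function `normalise`, and the tokenising pattern are all assumptions made for illustration.

```python
import re

# Hypothetical mapping: only the four examples named in the text.
RESPELLINGS = {
    "s'pose": "suppose",
    "ol'": "old",
    "wanna": "want to",
    "kinda": "kind of",
}


def normalise(line: str) -> list[str]:
    """Lower-case a line of dialogue, respell listed forms, and return tokens."""
    line = line.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    raw = re.findall(r"[a-z]+(?:'[a-z]+)*'?", line)
    tokens: list[str] = []
    for word in raw:
        replacement = RESPELLINGS.get(word)
        if replacement is None:
            # Unlisted forms are kept, minus any trailing quotation mark.
            tokens.append(word.rstrip("'"))
        else:
            # A respelling may expand one form into two running words.
            tokens.extend(replacement.split())
    return tokens


example_tokens = normalise("I wanna see that ol\u2019 dog")
```

Note that a respelling such as wanna to want to adds a running word, which is one reason the token counts of a normalised script can differ slightly from those of the raw script.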

3.3. Software and word lists 

The RANGE software (Nation & Heatley, 2002) was used to analyze the scripts. RANGE is a
computer program which lists the words that occur in a text according to their frequency.
Nation’s (2004) 14 lists of 1,000 word families were used with the RANGE software to show
the 1,000 word level (1,000-14,000) at which the words in the programs occurred. The lists 
were based on the frequency and range of occurrence of words in the BNC. The word families 
in the lists were categorized as Level 6 according to Bauer and Nation’s (1993) classification 
of word families. Level 6 word families include inflections and more than 80 derivational 
affixes. All word stems were free forms not bound forms. Words less frequent than the most
frequent 14,000 word families were classified by the RANGE program as proper nouns,
marginal words (e.g., oh, uh, mmm, ah) and Not in the Lists. The proper nouns list has over
13,000 entries, but this will rarely account for all of the proper nouns in an analysis of a 
corpus, and a number of proper nouns will be classified by RANGE as Not in the Lists (words 
less frequent than the most frequent 14,000 word families). Proper nouns found in the Not in 
the Lists were reclassified as proper nouns and added to the proper nouns totals. It should 
also be noted that several words such as: bartender, cheerleader, donut, and email, which 
were classified as Not in the Lists are likely to be known by learners with a vocabulary size of 
3,000 word families. This suggests that the coverage figures may be slightly conservative. 
The RANGE program and the word lists are freely available at Paul Nation’s website:
http://www.victoria.ac.nz/lals/staff/paul-nation/nation.aspx.
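The kind of frequency profile described in this section can be approximated with the following sketch. It does not use or reproduce RANGE itself: the lookup `band_of`, which maps a word form to a 1,000-word level (1 to 14) or to the labels "proper" and "marginal", is a hypothetical stand-in for the BNC lists, and the toy band assignments are invented for illustration. Cumulative coverage is computed here as proper nouns plus marginal words plus all levels up to the one named, on the assumption that this matches how cumulative figures are reported in the study.

```python
from collections import Counter
from typing import Union

Band = Union[int, str]
NOT_IN_LISTS = "not in the lists"


def coverage_by_band(tokens: list[str], band_of: dict[str, Band]) -> dict[Band, float]:
    """Percentage of running words falling in each band."""
    counts = Counter(band_of.get(token, NOT_IN_LISTS) for token in tokens)
    total = len(tokens)
    return {band: 100 * n / total for band, n in counts.items()}


def cumulative_coverage(coverage: dict[Band, float], max_band: int = 14) -> dict[int, float]:
    """Cumulative coverage per level, counting proper nouns and marginal words as known."""
    running = coverage.get("proper", 0.0) + coverage.get("marginal", 0.0)
    cumulative: dict[int, float] = {}
    for band in range(1, max_band + 1):
        running += coverage.get(band, 0.0)
        cumulative[band] = running
    return cumulative


# Toy data (hypothetical band assignments, not taken from the BNC lists).
toy_bands: dict[str, Band] = {
    "the": 1, "of": 1, "wants": 1, "now": 1,
    "scan": 3, "aorta": 9,
    "house": "proper", "oh": "marginal",
}
toy_tokens = "oh house wants the scan of the aorta stat now".split()
toy_coverage = coverage_by_band(toy_tokens, toy_bands)
toy_cumulative = cumulative_coverage(toy_coverage)
```

Here stat is absent from the toy lookup and is therefore counted as Not in the Lists, so cumulative coverage stops short of 100% even at the 14,000 level.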

If the most frequent 3,000 word families plus proper nouns and marginal words are
sufficient for comprehension of television programs (Webb & Rodgers, 2009a), examining
the low frequency vocabulary found in the 4,000-14,000 word lists which may be unknown to 
L2 learners may indicate the potential for incidental vocabulary learning. The number of 
times word families from these word lists were encountered may indicate whether unknown 
vocabulary is more likely to reoccur in related television programs than unrelated programs. 
Research on incidental vocabulary learning from reading indicates that from six (Rott, 1999) 
to 20 encounters (Waring & Takaki, 2003) may be needed to learn words with the amount of
knowledge gained dependent on the contexts in which the words are encountered (Webb, 
2008). In corpus-driven studies, Nation and Wang (1999) used 10 or more encounters with 
unknown words as the number of encounters necessary for incidental vocabulary learning 
through reading, Cobb (2007) used six encounters, and Horst (2009) used 10 or more 
encounters for learning through listening. Because the number of encounters necessary for 
learning can vary from word to word (Webb, 2008), it is useful to look at different numbers of 
encounters with words. One or two encounters are likely to lead to gains in knowledge of form
but minimal gains in knowledge of meaning, five to nine encounters may lead to partial 
knowledge of a number of aspects of knowledge, and 10 or more encounters with words may 
indicate a good chance of learning the meanings of words and other aspects of knowledge. 
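Grouping word families by number of encounters can be sketched as below. The bands follow the rows of Table 3 (1, 2, 3-4, 5-7, 8-9, and 10 or more encounters); the function name `encounter_bins` and the toy input are hypothetical, and the sketch assumes each token has already been mapped to its word family headword.

```python
from collections import Counter

# (label, lowest count, highest count); None means no upper limit.
BINS = [
    ("1", 1, 1),
    ("2", 2, 2),
    ("3-4", 3, 4),
    ("5-7", 5, 7),
    ("8-9", 8, 9),
    ("10+", 10, None),
]


def encounter_bins(family_tokens: list[str]) -> dict[str, int]:
    """Count how many word families fall into each encounter band."""
    frequencies = Counter(family_tokens)
    result = {label: 0 for label, _, _ in BINS}
    for count in frequencies.values():
        for label, low, high in BINS:
            if count >= low and (high is None or count <= high):
                result[label] += 1
                break
    return result


# Toy input (hypothetical): one headword per low frequency token.
toy_families = (
    ["aorta"] * 12 + ["stent"] * 6 + ["biopsy"] * 3 + ["lumbar"] * 2 + ["suture"]
)
toy_bins = encounter_bins(toy_families)
```

Dividing each count by the total number of word families would give percentages, which are the more informative figures when the sets being compared contain different numbers of low frequency tokens.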


4. RESULTS 

The cumulative coverage of the three subgenres, the random episodes with which they were 
matched, and the number of tokens, types, and families in each, is shown in Table 2. The last 
three rows of the table show the number of tokens, types and word families in each set of 
episodes. The episodes consisted of 1,330,268 tokens; 665,134 tokens were from the three 
subgenres (266,856, 182,620, and 215,658) and 665,134 were from the random episodes
(266,856, 182,620, and 215,658). The number of word types found in the subgenres and the
random episodes was inconsistent. The medical subgenre (11,688) and the spy/action 
subgenre (7,570) contained fewer word types than the matching random episodes (12,626 and 
10,233, respectively). However, the criminal forensic investigation subgenre (11,548) 
consisted of more word types than the matching random episodes (11,136). The results for 
word families were more consistent. The final row of Table 2 shows that there were 16% 
fewer word families in the medical subgenre (5,930) than in the matching set of random 
episodes (7,069), 25% fewer word families in the spy/action subgenre (4,450) than the 
matching set of random episodes (5,964), and 1% fewer word families in the criminal forensic 
investigation subgenre (6,356) than the matching set of random episodes (6,417). The 
relatively large number of types and families in the criminal forensic investigation subgenre in
comparison to the other two subgenres indicates that there may have been less overlap in 
content between the two programs in the criminal forensic investigation subgenre. 
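The percentage differences reported above can be checked directly from the counts in Table 2. The following is a small worked calculation; the helper name `percent_fewer` is introduced here only for illustration, and rounding to the nearest whole percent is assumed.

```python
def percent_fewer(related: int, random_set: int) -> int:
    """Percentage by which the related count falls below the random count."""
    return round(100 * (random_set - related) / random_set)


# (subgenre count, matched random count), copied from Table 2.
families = {
    "medical": (5930, 7069),
    "spy/action": (4450, 5964),
    "criminal forensic investigation": (6356, 6417),
}
types = {
    "medical": (11688, 12626),
    "spy/action": (7570, 10233),
    "criminal forensic investigation": (11548, 11136),
}

fewer_families = {name: percent_fewer(*pair) for name, pair in families.items()}
fewer_types = {name: percent_fewer(*pair) for name, pair in types.items()}
```

A negative value means the subgenre had more items than its matched random set, which is the case only for word types in the criminal forensic investigation subgenre.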


Word list         Medical   Random    Spy/action  Random      Criminal       Random matching
                  subgenre  matching  subgenre    matching    forensic       criminal forensic
                            medical               spy/action  investigation  investigation

1,000             88.29     90.22     91.71       90.38       88.08          90.56
2,000             92.65     94.42     95.63*      94.47       92.92          94.58
3,000             94.85     96.21*    96.85       96.19*      95.01*         96.25*
4,000             95.92*    97.27     97.74       97.27       96.21          97.28
5,000             96.65     97.83     98.28       97.86       97.02          97.82
6,000             97.1      98.29     98.63       98.31       97.49          98.24
7,000             97.42     98.54     98.77       98.54       97.81          98.53
8,000             97.72     98.74     98.92       98.74       98.11          98.74
9,000             97.94     98.9      99.02       98.92       98.31          98.91
10,000            98.14     99.05     99.14       99.09       98.47          99.05
11,000            98.33     99.16     99.22       99.19       98.63          99.16
12,000            98.45     99.25     99.29       99.27       98.9           99.24
13,000            98.53     99.3      99.34       99.33       98.99          99.29
14,000            98.63     99.33     99.36       99.36       99.08          99.35
Proper nouns      1.94      2.53      3.59        2.48        2.75           2.77
Marginal words    0.87      0.66      0.41        0.73        0.73           0.81
Not in the lists  1.38      0.67      0.64        0.65        0.93           0.66
Tokens            266,856   266,856   182,620     182,620     215,658        215,658
Types             11,688    12,626    7,570       10,233      11,548         11,136
Families          5,930     7,069     4,450       5,964       6,356          6,417

Note. *reaching 95% coverage

Table 2. The cumulative coverage including proper nouns and marginal words of programs within the
subgenres and the random matching episodes calculated using RANGE.


Table 2 also shows the cumulative coverage for the subgenres and the matching random 
programs. It is important to look at the cumulative coverage because it provides some 
indication of the vocabulary size necessary for comprehension as well as the vocabulary load; 
a higher proportion of high frequency words indicates a smaller lexical load. The table shows 
that the vocabulary size necessary to reach 95% coverage varied between the subgenres and 
the random programs. A vocabulary size of 4000 word families plus proper nouns and 
marginal words is necessary to reach 95.92% coverage of the medical programs. In contrast, a 
vocabulary size of 3000 word families plus proper nouns and marginal words reaches 96.21%
coverage in the random corpus matched with the medical subgenre. 

Viewers would need to know fewer words to reach 95% coverage of the spy/action 
subgenre than the random matching episodes. A vocabulary size of 2000 word families plus 
proper nouns and marginal words reaches 95.63% coverage for all of the episodes of 24 and 
Alias together. In contrast, a vocabulary size of 2000 word families plus proper nouns and 
marginal words reaches only 94.47% coverage in the random episodes matched with the spy 
subgenre. At the 3000 word level coverage was 96.85% for the spy subgenre. The criminal 
forensic investigation programs and the random episodes matched with it followed the same 
pattern as the medical programs; viewers would need to know more words to reach 95% 
coverage of the crime programs than the matched random episodes. A vocabulary size of
3000 word families plus proper nouns and marginal words provided 95.01% coverage of the
criminal forensic investigation programs. In comparison, a vocabulary size of 3000 word
families plus proper nouns and marginal words reaches 96.25% coverage of the matched
random episodes.
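The vocabulary size at which each set first reaches 95% coverage can be read off Table 2 mechanically, as in the sketch below. The cumulative figures for the 1,000 to 4,000 levels are copied from Table 2; the dictionary keys and the helper `first_level_reaching` are hypothetical names used only for this illustration.

```python
from typing import Optional

# Cumulative coverage (%) at the 1,000, 2,000, 3,000 and 4,000 levels, from Table 2.
CUMULATIVE = {
    "medical": [88.29, 92.65, 94.85, 95.92],
    "random (medical)": [90.22, 94.42, 96.21, 97.27],
    "spy/action": [91.71, 95.63, 96.85, 97.74],
    "random (spy/action)": [90.38, 94.47, 96.19, 97.27],
    "criminal forensic": [88.08, 92.92, 95.01, 96.21],
    "random (criminal forensic)": [90.56, 94.58, 96.25, 97.28],
}


def first_level_reaching(coverages: list[float], threshold: float = 95.0) -> Optional[int]:
    """Return the first 1,000-word level whose cumulative coverage meets the threshold."""
    for index, value in enumerate(coverages):
        if value >= threshold:
            return (index + 1) * 1000
    return None


levels_at_95 = {name: first_level_reaching(values) for name, values in CUMULATIVE.items()}
```

Read this way, the table marks the 4,000 level for the medical subgenre, the 2,000 level for the spy/action subgenre, and the 3,000 level for the criminal forensic investigation subgenre and for all three random sets.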

Table 3 shows the number and percentage of encounters with word families from the 
4,000-14,000 word lists in the subgenres and the random episodes. In this comparison 
between subgenres and random episodes it is important to look at the percentages rather than 
the number of encounters because the number of low frequency tokens varies. For example, 
there are more low frequency tokens in the medical subgenre (10,069) than in the random 
episodes matched with the medical subgenre (8,323) because the cumulative coverage was 
higher at the 3,000 word level for the random episodes. 

The results were consistent between the three subgenres and indicate that there is 
greater potential for incidental vocabulary learning through watching different episodes of 
programs from the same subgenre than watching random television programs. The percentage 
of words encountered once was lower in all three subgenres and the percentage of words 
encountered 10 or more times was higher in all three subgenres than in the matched random 
episodes. In the medical subgenre 46% of words were encountered once and 9% were 
encountered 10 or more times. In the random episodes matched with the medical subgenre 
49% of the words were encountered once and 4% were encountered 10 or more times. In the 
spy/action subgenre 54% of words were encountered once and 5% were encountered 10 or 
more times. In the random episodes matched with the spy/action subgenre 55% of the words 
were encountered once and 4% were encountered 10 or more times. In the criminal forensic 
investigation subgenre 49% of words were encountered once and 5% were encountered 10 or 
more times. In the random episodes matched with the criminal forensic investigation subgenre
54% of the words were encountered once and 3% were encountered 10 or more times. 













Number of encounters      Medical       Random        Spy/action    Random        Criminal        Random matching
                          subgenre      matching      subgenre      matching      forensic        criminal forensic
                                        medical                     spy/action    investigation   investigation
                          Amount (%)    Amount (%)    Amount (%)    Amount (%)    Amount (%)      Amount (%)

1 encounter               1,159 (46%)   1,408 (49%)   774 (54%)     1,256 (55%)   1,285 (49%)     1,340 (54%)
2 encounters              469 (19%)     568 (20%)     258 (18%)     457 (20%)     479 (18%)       480 (19%)
3-4 encounters            398 (16%)     462 (16%)     205 (14%)     323 (14%)     397 (15%)       358 (14%)
5-7 encounters            197 (8%)      238 (8%)      89 (6%)       132 (6%)      245 (9%)        187 (7%)
8-9 encounters            74 (3%)       71 (2%)       35 (2%)       34 (1%)       63 (2%)         55 (2%)
10+ encounters            227 (9%)      128 (4%)      78 (5%)       82 (4%)       139 (5%)        79 (3%)
Mean encounters for 10+   21            18            23            18            23              20
Total word families       2,524         2,875         1,439         2,284         2,608           2,499
Total tokens              10,069        8,323         4,580         5,763         8,754           6,655

Table 3. Number and percentage of encounters with 4,000-14,000 level word families in the subgenres and the
matching random episodes calculated using RANGE.


It is important to note that the number of words that were encountered 10 or more times
in the episodes was lower in the spy/action subgenre (78) than in the matching random episodes
(82). This was probably because there were 21% fewer low frequency tokens in the
spy/action subgenre (4,580) than in the matched random episodes (5,763). The percentage of
words encountered 10 or more times and the mean number of times those words were
encountered in the spy programs indicate that even if there are fewer lower frequency words
occurring in related programs, there is greater potential for learning those words than in
random television programs.


5. DISCUSSION 

The present study expanded upon recent corpus-driven research on television programs
(Webb & Rodgers, 2009a), narrow viewing (Rodgers & Webb, in press), and narrow reading
(Gardner, 2004; Hwang & Nation, 1989; Schmitt & Carter, 2000; Sutarsyah, Nation, &
Kennedy, 1994) by looking at the vocabulary in pairs of television programs with related
content in comparison to sets of random television programs. The analysis should provide
some indication of the extent to which vocabulary is likely to reoccur within related programs
and unrelated programs. It should also indicate the potential for incidental vocabulary learning
within those sets.

© Servicio de Publicaciones. Universidad de Murcia. All rights reserved. IJES, vol. 11 (1), 2011, pp. 117-135

Stuart Webb

In answer to the first research question, the results indicated that television programs 
within the same subgenre are likely to have fewer word families than random television 
programs; however, the number of word types may vary between related and unrelated 
programs. Because research on the psychological reality of word families shows that knowing 
the stem of a word may facilitate recognition of its inflectional and derivational members 
(Bertram, Laine, & Virkkala, 2000; Nagy, Anderson, Schommer, Scott, & Stallman, 1989; 
Wysocki & Jenkins, 1987), the results for word families should provide an accurate 
assessment of the vocabulary load for watching television. In all three subgenres, the number
of word families was lower than in the sets of matched random episodes. There were 16% fewer
word families in the medical subgenre, 25% fewer word families in the spy/action subgenre,
and 1% fewer word families in the criminal forensic investigation subgenre than in the sets of
matched random episodes. There were also 7% fewer word types in the medical subgenre
and 26% fewer word types in the spy/action subgenre, but 4% more word types in the 
criminal forensic investigation subgenre than in the random episodes matched with each 
subgenre. The relatively large number of types and families in the criminal forensic 
investigation subgenre indicates that there may have been less overlap in content and 
storylines between the two programs (CSI and Crossing Jordan), in comparison to the 
programs in the other two subgenres. This may also indicate that selecting programs with a 
high degree of overlap in content and storylines may be challenging. 

The findings are supported by previous research on narrow viewing which has shown 
that different episodes of the same program had a lower vocabulary load than random 
television programs (Rodgers & Webb, in press), and research on narrow reading which has 
shown that related texts have a lower vocabulary load than unrelated texts (Hwang & Nation, 
1989; Schmitt & Carter, 2000; Sutarsyah, Nation, & Kennedy, 1994). The results have 
pedagogical value because they indicate that related television programs are likely to have a 
lower vocabulary load than unrelated television programs, and may be easier for viewers to 
understand because a smaller range of vocabulary is likely to be found in television programs 
from within the same subgenre than in random television programs. 

The results of this study as well as previous studies (Rodgers & Webb, in press; Webb 
& Rodgers, 2009a) showed that the vocabulary size necessary to understand programs may 
vary between programs and genres. A vocabulary size of 2,000 word families was sufficient 
to reach 95% coverage of the spy/action subgenre, a vocabulary size of 3,000 word families 
was necessary to reach 95% coverage of the criminal forensic investigation subgenre and the
random sets of programs, and a vocabulary size of 4,000 word families was necessary to reach
95% coverage of the medical subgenre. This means that while related programs may have
fewer word families, it is still particularly important to select programs and genres for
language learning that consist of a greater percentage of high frequency words if
comprehension is a challenge for viewers.
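
The relationship between vocabulary size and coverage described above can be sketched in a few lines. This is a minimal illustration and not the RANGE program used in the study: the function names, the toy token list, and the frequency levels assigned to each word are all invented for the example. It computes cumulative coverage at each 1,000-word level and then reports the lowest level at which coverage reaches 95%.

```python
from collections import Counter


def cumulative_coverage(tokens, family_level, max_level):
    """Percentage of tokens covered by word families at or below each level.

    family_level maps a token to the 1,000-word frequency level of its word
    family (1 = most frequent 1,000 families). Tokens missing from the map
    are treated as not covered at any level.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    per_level = Counter(family_level.get(token) for token in tokens)
    coverage, running = {}, 0
    for level in range(1, max_level + 1):
        running += per_level.get(level, 0)
        coverage[level] = running * 100 / len(tokens)
    return coverage


def level_needed(coverage, target=95.0):
    """Return the lowest level whose cumulative coverage reaches target, or None."""
    for level in sorted(coverage):
        if coverage[level] >= target:
            return level
    return None


# Invented toy data: 20 running words and hypothetical frequency levels.
toy_levels = {"the": 1, "hospital": 2, "diagnosis": 3, "aneurysm": 4}
toy_tokens = ["the"] * 17 + ["hospital", "diagnosis", "aneurysm"]

coverage = cumulative_coverage(toy_tokens, toy_levels, max_level=4)
needed = level_needed(coverage)
print(coverage, needed)
```

In the toy data, coverage reaches 95% at the third level, which mirrors the way a subgenre can require knowledge of the most frequent 3,000 word families before 95% of its running words are known.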

Programs from the medical subgenre and, to a lesser extent, the criminal forensic
investigation subgenre may have a greater number of low frequency words because they contain
more genre-specific technical vocabulary. For example, in the medical subgenre the following are
some of the technical words encountered 20 or more times: aneurysm, bacterium, biopsy,
cardiac, diagnosis, hallucinate, lumbar, ovary, pulmonary, seizure, syndrome, toxin,
tuberculosis, tumor, and vomit. There are two useful ways of dealing with these words. First,
pre-learning the 10 most frequent word families that are likely to be unknown has the potential to
improve comprehension by increasing vocabulary coverage (Webb, 2010b). Pre-teaching
words before viewing English language videos has been found to increase comprehension
(Chung, 2002). Second, creating glossaries of these words is also likely to aid comprehension
through increased coverage (Webb, 2010c).

In answer to the second research question, the results indicated that word families are 
more likely to reoccur in programs from within the same subgenre than in random television 
programs. There was a higher percentage of low frequency (4,000-14,000 level) word families 
encountered 10 or more times in each of the sets of programs within the same subgenre than 
in the matched random episodes. The percentage of word families encountered 10 or more 
times ranged from 5% in both the spy/action and criminal forensic investigation subgenres to 
9% in the medical subgenre. In contrast, the percentages ranged from 3% in the random 
episodes matched with the criminal forensic investigation subgenre to 4% for the other two 
sets of matched random episodes. There was also either a higher or equivalent percentage of 
low frequency word families encountered 8-9 times, 5-7 times, and 3-4 times in the programs 
within the same subgenres. In contrast, there was a higher percentage of word families 
encountered once and twice in the matched random episodes than within the programs from 
the same subgenre. For example, 46% of the words in the medical subgenre, 54% of the
words in the spy/action subgenre, and 49% of the words in the criminal forensic investigation
subgenre were encountered only once, whereas the percentages were 49%, 55%, and 54%,
respectively, in the matched random episodes. Taken together, the results indicate that
watching television programs with related content may be a more effective way of learning 
vocabulary than watching random programs because unknown words are more likely to 
reoccur in programs with related content. 
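
The encounter bands used in this comparison (once, twice, 3-4, 5-7, 8-9, and 10 or more times) can be illustrated with a brief sketch. It assumes that each running word has already been mapped to its word family headword. The function name, the band table, and the toy data are hypothetical and were not taken from the study's materials.

```python
from collections import Counter

# Encounter bands as (label, lowest count, highest count); None means no upper limit.
BANDS = [
    ("1", 1, 1),
    ("2", 2, 2),
    ("3-4", 3, 4),
    ("5-7", 5, 7),
    ("8-9", 8, 9),
    ("10+", 10, None),
]


def encounter_bands(family_tokens):
    """Percentage of distinct word families that fall into each encounter band."""
    encounters = Counter(family_tokens)
    if not encounters:
        raise ValueError("family_tokens must not be empty")
    band_counts = {label: 0 for label, _, _ in BANDS}
    for count in encounters.values():
        for label, low, high in BANDS:
            if count >= low and (high is None or count <= high):
                band_counts[label] += 1
                break
    total = len(encounters)
    return {label: n * 100 / total for label, n in band_counts.items()}


# Invented toy data: four word families with 12, 6, 2, and 1 encounters.
toy_tokens = ["biopsy"] * 12 + ["seizure"] * 6 + ["toxin"] * 2 + ["lumbar"]
bands = encounter_bands(toy_tokens)
print(bands)
```

Running the same function over a set of related episodes and over a matched set of random episodes would give two distributions that can be compared band by band, which is the form of comparison reported above.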

Research investigating incidental vocabulary learning through watching television 
indicates that both L1 viewers (Oetting, Rice, & Swank, 1995; Rice & Woodsmall, 1988) and
L2 viewers (d'Ydewalle & Pavakanun, 1995; d'Ydewalle & Van de Poel, 1999; Koolstra &
Beentjes, 1999; Neuman & Koskinen, 1992; Pavakanun & d'Ydewalle, 1992) do incidentally
learn words through watching television. Although there has not been any research examining
the number of encounters necessary to learn words incidentally through extensive viewing, it
is likely that learning words through watching television is similar to learning words through
reading. Research on incidental vocabulary learning through reading has shown that the more
often words are encountered in context, the more likely they are to be learned (Horst, Cobb, &
Meara, 1998; Jenkins, Stein, & Wysocki, 1984; Rott, 1999; Saragi, Nation, & Meister, 1978;
Waring & Takaki, 2003; Webb, 2007). The number of encounters necessary to learn words is
likely to be a function of the amount of information present in the context in which the words 
are encountered (Webb, 2008). With reading this may relate primarily to the semantic 
relationships between words in the text. However, with television it may also be a function of 
the clarity of the discourse, the speed of the discourse, and the amount of semantic overlap 
between the imagery and the vocabulary. 

In answer to the third research question, the number of word families encountered 10 or 
more times within a full season of two programs from the same subgenre ranged from 78 in 
the spy/action subgenre to 227 in the medical subgenre for a total of 444 word families. The
mean number of encounters for these items ranged from 21 to 23, indicating that these word
families were encountered often within the subgenres. If we count the word families that were
encountered 10 or more times as potentially learned items, then the results indicate that there
is potential for significant incidental vocabulary learning through watching a full season of
two television programs with related content. It is also useful to look at the number of word
families encountered five or more times because five encounters with words in context may be
sufficient to gain partial knowledge of these items. The number of word families encountered
five or more times within a full season of two programs from the same subgenre ranged from
202 for the spy/action subgenre to 498 for the medical subgenre for a total of 1,147 word
families. It is important to note that viewers are also likely to gain knowledge of higher
frequency words through watching television programs. Tests of vocabulary size may indicate 
that these words are known. However, it is unlikely that learners have full knowledge of these 
words. Further encounters with high frequency items in original contexts are likely to 
strengthen different aspects of vocabulary knowledge for those words. 

It is also useful to look at the potential for vocabulary learning through watching 
random television programs. The number of word families encountered 10 or more times 
within an amount of random television episodes that is equivalent to a full season of two 
programs ranged from 79 to 128 for a total of 289 word families. The mean number of
encounters for these items ranged from 18 to 20, indicating that these word families were also
encountered often within the random programs. The number of word families encountered
five or more times in the matched random episodes ranged from 248 to 437 for a total of 1,006
word families. Although the results for the sets of random episodes are not quite as high as
they are for the subgenres, the data for the random episodes indicate that there is great
potential for incidental vocabulary learning through regular viewing of television programs.
Each of the sets consisted of approximately 34 hours of viewing time. If learners are able to
understand and enjoy L2 television programs, this figure may represent a relatively small
amount of viewing time.

It is important to note, however, that the amount of low frequency vocabulary is a 
function of the coverage at the 3,000 word level. For example, in the medical subgenre, the
vocabulary coverage was relatively low at the 3,000 word level (94.85%). This means that
there was a relatively large number of low frequency words in these episodes. It also
means that the degree of comprehension could be relatively low because viewers need to
know more words to reach 95% coverage. Conversely, the random episodes matched with the
medical subgenre had a much higher coverage at the 3,000 word level (96.21%). The higher
coverage indicates that these episodes may be easier to understand because viewers need
fewer words to reach 95% coverage. Together the results show that there may be a trade-off
between comprehension and vocabulary learning. If there is higher coverage at the 3,000
word level there may be better comprehension but there will be fewer encounters with low
frequency items. On the other hand, higher coverage may provide better conditions to learn
words; there is likely to be greater understanding of the information in the contexts that can be
used to learn unknown items (Liu & Nation, 1985). Further research examining the effects of
coverage on vocabulary learning would be a useful follow-up to this study. 


6. CONCLUSION 

The present study provides some direction on how television might be used effectively for
language learning. Watching L2 television programs is likely to be difficult at first. Initially the
speed of the dialogue, the unfamiliar spoken forms of words that have only been encountered
previously in text, and the amount of spoken input may be overwhelming. If comprehension is
challenging, it may be more effective to watch television programs with related content and
storylines than programs with unrelated content. Watching similar programs is likely to reduce
the lexical burden and may also increase background knowledge, which may aid comprehension
when viewing subsequent episodes with similar content. The primary aim when teaching with L2
television programs should be to support comprehension because if viewers can understand
L2 television programs they are more likely to watch them regularly. The findings in this
study suggest that regular viewing of related programs may lead to substantial incidental
vocabulary learning.


NOTES 

1 For more information about the word lists see Nation (2004, 2006).


APPENDIX

Random programs matched with medical dramas 


Program                                         Episode title

30 Rock                                         Secrets and Lies
Beverly Hills 90210                             Love Me or Leave Me
Alf                                             Fever
Aliens in America                               Rocket Club
Ally McBeal                                     Boy Next Door
American Dad                                    The Most Adequate Christmas Ever
Andromeda                                       The Honey Offering
Andy Barker P.I.                                The Big No Sleep
Back to You                                     A Gentleman Always Leads
Big Day                                         Alice Can't Dance
Carpoolers                                      The Troubles
Criminal Minds                                  Limelight
Crusoe                                          Hour 9 - Name of the Game
Desperate Housewives                            You Can't Judge A Book By It's Cover
Dharma & Greg                                   Daughter of the Bride of Finkelstein
Eli Stone                                       The Humanitarian
Everybody Hates Chris                           Everybody Hates Cutting School
Everybody Loves Raymond                         Boys' Therapy
Flash Gordon                                    Ebb and Flow
Friends                                         The One With The Birthing Video
Heroes                                          Five Years Gone
In Justice                                      Side Man
Kidnapped                                       Burn Baby Burn
Kyle XY                                         The Future's So Bright I Gotta Wear Shades
Lois & Clark: The New Adventures of Superman    Chi of Steel
Lost                                            Jughead
Men in Trees                                    Kiss and Don't Tell
Monk                                            Mr. Monk Gets a New Shrink
My Name Is Earl                                 We've Got Spirit
New Amsterdam                                   Golden Boy
Numb3rs                                         Structural Corruption
Pepper Dennis                                   Heiress Bridenapped -- Film at Eleven
Private Practice                                Crime and Punishment
Privileged                                      All About Defining Yourself
Pushing Daisies                                 Bad Habits
Scrubs                                          My Lips Are Sealed
Seinfeld                                        The Andrea Doria
Shark                                           Wayne's World
Smallville                                      Power
Star Trek The Next Generation                   Power Play
Stargate SG-1                                   Avenger 2.0
Studio 60 on the Sunset Strip                   The Friday Night Slaughter
The 4400                                        Suffer the Children
The Black Donnellys                             A Stone of the Heart
The Dead Zone                                   Grains of Sand
The L Word                                      Light My Fire
The Nanny                                       Stop the Wedding I Want to Get Off
The New Adventures Of Old Christine             Self-Esteem Tempura
The O.C.                                        The Pot Stirrer
The Secret Life of the American Teenager        Caught
The Simpsons                                    The Seemingly Never-Ending Story
The War at Home                                 The Empire Spanks Back
The West Wing                                   The Stormy Present
Threshold                                       Blood of the Children
Traveler                                        The Trader
Twin Peaks                                      Episode Four
Two and a Half Men                              Look At Me Mommy I'm Pretty
Ugly Betty                                      Sisters on the Verge of a Nervous Breakdown
Veronica Mars                                   Poughkeepsie Tramps & Thieves
Women’s Murder Club                             FBI Guy
Xena Warrior Princess                           Lyre Lyre Hearts on Fire